Improved Lite Audio-Visual Speech Enhancement
Abstract
Numerous studies have investigated the effectiveness of audio-visual multimodal learning for speech enhancement (AVSE) tasks, seeking a solution that uses visual data as auxiliary and complementary input to reduce the noise of noisy speech signals. Recently, we proposed a lite audio-visual speech enhancement (LAVSE) algorithm for a car-driving scenario. Compared to conventional AVSE systems, LAVSE requires less online computation and to some extent solves the user privacy problem on facial data. In this study, we extend LAVSE to improve its ability to address three practical issues often encountered in implementing AVSE systems, namely, the additional cost of processing visual data, audio-visual asynchronization, and low-quality visual data. The proposed system is termed improved LAVSE (iLAVSE), which uses a convolutional recurrent neural network architecture as the core AVSE model. We evaluate iLAVSE on the Taiwan Mandarin speech with video dataset. Experimental results confirm that, compared to conventional AVSE systems, iLAVSE can effectively overcome the aforementioned three practical issues and can improve enhancement performance. The results also confirm that iLAVSE is suitable for real-world scenarios, where high-quality audio-visual sensors may not always be available.
Similar Resources
Audio Visual Speech Enhancement
This thesis presents a novel approach to speech enhancement by exploiting the bimodality of speech production and the correlation that exists between audio and visual speech information. An analysis into the correlation of a range of audio and visual features reveals significant correlation to exist between visual speech features and audio filterbank features. The amount of correlation was also...
Audio-visual enhancement of speech in noise.
A key problem for telecommunication or human-machine communication systems concerns speech enhancement in noise. In this domain, a certain number of techniques exist, all of them based on an acoustic-only approach--that is, the processing of the audio corrupted signal using audio information (from the corrupted signal only or additive audio information). In this paper, an audio-visual approach ...
Inventory-Based Audio-Visual Speech Enhancement
In this paper we propose to combine audio-visual speech recognition with inventory-based speech synthesis for speech enhancement. Unlike traditional filtering-based speech enhancement, inventory-based speech synthesis avoids the usual trade-off between noise reduction and consequential speech distortion. For this purpose, the processed speech signal is composed from a given speech inventory whi...
Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization)
In this paper, we introduce a non-linear enhancement technique called Audio-Visual Codebook Dependent Cepstral Normalization (AVCDCN) and we consider its use with both audio-only and audio-visual speech recognition. AVCDCN is inspired from CDCN [1] [2], an audio-only enhancement technique that approximates the non-linear effect of noise on speech with a piece-wise constant function. Our experim...
Introducing the Turbo-Twin-HMM for Audio-Visual Speech Enhancement
Models for automatic speech recognition (ASR) hold detailed information about spectral and spectro-temporal characteristics of clean speech signals. Using these models for speech enhancement is desirable and has been the target of past research efforts. In such model-based speech enhancement systems, a powerful ASR is imperative. To increase the recognition rates especially in low-SNR condition...
Journal
Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Year: 2022
ISSN: 2329-9304, 2329-9290
DOI: https://doi.org/10.1109/taslp.2022.3153265